Introduction to version control with Git

Scientific workflows: Tools and Tips 🛠️

6/15/23

What is this lecture series?

Scientific workflows: Tools and Tips 🛠️

📅 Every 3rd Thursday 🕓 4-5 p.m. 📍 Webex

By Jorge Cham from PhD Comics

Version control

Requirements

  • Complete and long-term history of every file in your project

  • Safe (e.g. no accidental loss of versions)

  • Easy to use

  • Overview and documentation of all changes

  • Collaboration should be possible

Version control with Git

Version control with Git

  • Open source and free to use version control software

  • Quasi standard for software development

    • Optimized for text files (e.g. code, …)
  • A whole universe of other software and services around it

Today

  • Basic concepts of Git
  • A simple workflow in theory and practice
  • A small outlook on more advanced features

Version control with Git

  • For projects with mainly text files (e.g. code, markdown files, …)

  • Basic idea: Take snapshots of your project over time

    • Git registers the changes that you make to your projec
  • Information on the version history etc. stored in a special .git folder

Version control with Git

Git is a distributed version control system

  • Idea: many local repositories synced via one remote repo
  • Every computer has full-fledged version of repository with entire history

How to use Git

After you installed it there are different ways to use the software for you projects

How to use Git - Terminal

Using Git from the terminal

Gives you most control

You find a lot of help online

You need to use the terminal

How to use Git - GUIs

A Git GUI is integrated in most (all?) IDEs, e.g. R Studio

(Often) Easy and intuitive

Stay inside your IDE

Not universal

How to use Git - GUIs

Standalone Git GUI software, e.g. Github Desktop

Easy and intuitive

Helps with initial setup of Git

Nice integration with Github

Switch program to use Git

How to use Git

Which one to choose?

  • Depends on your prior experience and taste
  • Depends on your IDE
  • If you never used the terminal before, I recommend to start with Github Desktop
    • But in the long run, it’s definitely worth it looking into the terminal
  • You can also mix methods and freely switch between them

The basic Git workflow

git init, git add, git commit, git push

Starting situation

  • Local working directory (i.e. a folder) with or without files

Step 1: Initialize a git repository

  • Add a (hidden) .git folder to your project that will contain the Git repository
  • You don’t have to touch anything that is in this folder

Step 2: Modify files and stage changes

Git detects any changes in the working directory

Git shows you all the changes you did to the files.

Step 2: Modify files and stage changes

When you want a file to be part of the next commit (i.e. snapshot), you have to stage the file

  • In the terminal use git add
  • Usually in Git GUIs this is just a check mark next to the file name

Step 3: Commit changes

  • Commits are the snapshots of your project states

  • Commit work from staging area to local repository

    • Collect meaningful chunks of work in the staging area, then commit

  • After a commit, your changes are part of your project’s git history

Commit messages

  • Every commit has a unique identifier (so-called hash)
    • You can use this hash to come back to the version
  • Every commit has a commit message
  • Good commit messages are essential to make good use of git!

A good commit message has a subject line and a body, e.g.

Add function to calculate temperature

This is the body where you can explain in detail which changes you did
and why you did them. Then your colleages (and you from the future) immediately
understand why.

Step 4: Create and connect a remote repo

  • Remote repositories are on a server and can be used to synchronize, share and collaborate

  • Remote repositories can be private (only for you and selected collaborators) or public (visible to anyone online)

Step 5: Share your changes with the remote repo

  • Push your local changes to the remote with git push

Summary

  • git init: Initialize a git repository
    • adds a .git folder to your working directory
  • git add: Add files to the staging area
    • This marks the files as being part of the next commit
  • git commit: Take a snapshot of your current project version
    • Includes a timestamp, a meaningful commit message and information on the person who did the commit
  • git push: Push your newest commits to the remote repository
    • Sync your local project version with the remote e.g. on Github

Share and collaborate

Between machines and/or people

Clone a repo from a remote

  • Get a copy of your own repository on a different machine
  • Get the repository from somebody else

Clone a repo from a remote

By cloning, you get a full copy of the repository and the working directory with all files on your machine.

  • Clone a remote repository with git clone <remote_address>

  • If the clone is authorized it can also commit and push

Get changes from the remote

  • Local changes, publish to remote: git push

  • Remote changes, pull to local: git pull

A simple collaboration workflow



  • By default: Everything on one branch (main)
    • Branches are connections between specific commits

A more complex collaboration workflow

  • You can also have multiple branches of the same project

  • Work on separate branches for
    • Separate people
    • Separate features

A more complex collaboration workflow

Branches can be merged using git merge

Remote repository platforms

The combination of Git and a remote repository platform unlocks a lot of possibilities!

  • Advanced workflow features for collaboration and open-source development
    • Issues and pull requests
  • Publishing and sharing of projects
    • Easily connect with Zenodo to get a DOI
    • Accepted by many journals
  • Additional features
    • Project wikis
    • Project websites

Take home

  • Git (+ Github) is very powerful for coding projects
    • Keep track of your changes and go back if you need to
    • Collaborate and share
    • Have fun
  • Can be confusing in the beginning, but Git GUIs make it intuitive
  • Valuable addition to your toolbox that’s also relevant outside academia

Tips for getting started

Start using it for small projects and discover features as you go along.

Don’t get frustrated by the complexity - it will get better.

Use a GUI if you don’t like the terminal.

Get started

Command line

Follow this Git training for learning the Git concepts in the command line.

R and R Studio

There is a whole book on using Git with R that explains the setup in detail but also goes into more advanced topics.

Follow this step by step guide to set up Git and a Github connection in R and R Studio

Github Desktop

There are detailed step by step guides on how to set up Github Desktop and how to work with in the Github Desktop Documentation

Next lecture

Research compendia with R

A research compendium is a collection of all the digital parts of your research projects (data, code, documents) with the goal of your results being reproducible. You can do this in R by building an R 📦 which makes it easy to publish a fully reproducible version of your project.


📅 20th July 🕓 1-2 p.m. 📍 Webex

🔔 Subscribe to the mailing list

📧 For topic suggestions and/or feedback send me an email

Thank you for your attention :)

Questions?

References

Learn git concepts, not commands: Blogpost that explains really well the concepts of git, also more advanced ones like rebase or cherry-pick.

How to write good commit messages: Blogpost that explains why good commit messages are important and gives 7 rules for writing them.

Git cheat sheet: Always handy if you don’t remember the basic commands

Book on how to use Git wih R